47 research outputs found
TensorFlow on state-of-the-art HPC clusters: a machine learning use case
The recent rapid growth of the data-flow programming paradigm has enabled the development of specialized architectures,
e.g., for machine learning. The best-known example is the Tensor Processing Unit (TPU) by Google. Standard data centers, however, still cannot dedicate large partitions to machine-learning-specific architectures. Within data centers, High-Performance Computing (HPC) clusters are highly parallel machines targeting a broad class of compute-intensive workloads, and as such they can be used to tackle machine learning challenges. On top of this, HPC architectures are changing rapidly, incorporating accelerators and instruction sets beyond the classical x86 CPUs. In this unsettled landscape, identifying the hardware/software configurations that best support machine
learning workloads on HPC clusters is not trivial. In this paper, we consider the TensorFlow workflow for image recognition. We highlight the strong dependency of training-phase performance on the availability of arithmetic libraries optimized for the underlying architecture. Following the example of Intel leveraging the MKL libraries to improve TensorFlow performance, we plugged the Arm Performance Libraries into TensorFlow and tested it on an HPC cluster based on Marvell ThunderX2 CPUs. We also performed a scalability study on three state-of-the-art HPC clusters based on different CPU architectures: x86 Intel Skylake, Arm-v8 Marvell ThunderX2, and PowerPC IBM Power9.
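The dependency of dense-arithmetic performance on the linked library can be observed at small scale with NumPy, which, like TensorFlow, delegates its matrix products to whichever BLAS it was built against (MKL, Arm Performance Libraries, OpenBLAS, ...). A minimal probe, where the matrix size and repeat count are arbitrary illustrative choices rather than the paper's benchmark:

```python
import time
import numpy as np

def gemm_gflops(n=1024, repeats=3):
    """Time an n x n matrix multiply and report sustained GFLOP/s.

    A GEMM performs roughly 2*n^3 floating-point operations, so the
    achieved rate directly reflects the quality of the linked BLAS.
    """
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    a @ b  # warm-up: trigger any lazy initialization inside the BLAS
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - t0)
    return 2.0 * n**3 / best / 1e9

if __name__ == "__main__":
    np.show_config()  # reveals which BLAS/LAPACK NumPy was built against
    print(f"sustained GEMM rate: {gemm_gflops():.1f} GFLOP/s")
```

Running this with the same NumPy version linked against different BLAS backends typically shows order-of-magnitude rate differences, mirroring the training-phase sensitivity reported above.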
MPI+X: task-based parallelization and dynamic load balance of finite element assembly
The main computing tasks of a finite element (FE) code for solving partial
differential equations (PDEs) are the algebraic system assembly and the
iterative solver. This work focuses on the first task, in the context of a
hybrid MPI+X paradigm. Although we describe the algorithms in the FE context,
a similar strategy can be applied straightforwardly to other discretization
methods, such as the finite volume method. The matrix assembly consists of a
loop over the elements of the MPI partition that computes the element
matrices and right-hand sides and assembles them into the system local to
each MPI partition. In an MPI+X hybrid parallelism context, X has
traditionally consisted of loop parallelism using OpenMP. Several strategies
have been proposed in the literature to implement this loop parallelism, such
as coloring or substructuring techniques, to circumvent the race condition
that appears when assembling the element system into the local system. The
main drawback of the first technique is a decrease in IPC due to poor spatial
locality. The second technique avoids this issue but requires extensive
changes in the implementation, which can be cumbersome when several element
loops must be treated. We propose an alternative based on task parallelism of
the element loop, using some extensions to the OpenMP programming model. The
taskification of the assembly solves both aforementioned problems. In
addition, dynamic load balance is applied using the DLB library, which is
especially efficient in the presence of hybrid meshes, where the relative
costs of the different elements are impossible to estimate a priori. This
paper presents the proposed methodology, its implementation, and its
validation through the solution of large computational mechanics problems on
up to 16k cores.
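The coloring strategy contrasted above can be sketched as follows: two elements that share a mesh node receive different colors, so each per-color sweep is race-free, but the sweep visits non-contiguous entries of the local system, which is exactly the spatial-locality drawback noted. The tiny mesh and unit element contributions are made up for illustration:

```python
import numpy as np

def color_elements(elems):
    """Greedy coloring: two elements sharing a mesh node get different
    colors, so all elements of one color can be assembled concurrently
    without races on the shared local system."""
    node_to_elems = {}
    for e, nodes in enumerate(elems):
        for n in nodes:
            node_to_elems.setdefault(n, []).append(e)
    colors = [-1] * len(elems)  # -1 means not yet colored
    for e, nodes in enumerate(elems):
        used = {colors[ne] for n in nodes for ne in node_to_elems[n]}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors

def assemble_rhs(elems, elem_rhs, n_nodes):
    """Assemble element right-hand sides color by color; each inner loop
    could be an OpenMP-style parallel loop with no atomics."""
    colors = color_elements(elems)
    rhs = np.zeros(n_nodes)
    for c in range(max(colors) + 1):
        for e in (i for i, ci in enumerate(colors) if ci == c):
            rhs[list(elems[e])] += elem_rhs[e]
    return rhs
```

On a 1D mesh of two-node elements such as `[(0, 1), (1, 2), (2, 3)]`, adjacent elements land in different colors, and the assembled vector matches the serial result.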
Computational Fluid and Particle Dynamics Simulations for Respiratory System: Runtime Optimization on an Arm Cluster
Computational fluid and particle dynamics (CFPD) simulations are of paramount importance for studying and improving drug effectiveness. The computational requirements of CFPD codes call for high-performance computing (HPC) resources. For these reasons, we introduce and evaluate in this paper system software techniques for improving performance and tolerating load imbalance in a state-of-the-art production CFPD code. We demonstrate the benefits of these techniques on both Intel- and Arm-based HPC clusters, showing the importance of mechanisms applied at runtime to improve performance independently of the underlying architecture. We run a real CFPD simulation of particle tracking in the human respiratory system, showing performance improvements of up to 2x while keeping the computational resources constant. This work is partially supported by the Spanish
Government (SEV-2015-0493), by the Spanish Ministry of Science and Technology project (TIN2015-65316-P), by the Generalitat
de Catalunya (2017-SGR-1414), and by the European Mont-Blanc projects (288777, 610402 and 671697).
Dynamic load balancing for hybrid applications
DLB relies on the use of hybrid programming models and exploits the malleability of the second level of parallelism to redistribute computing power across processes.
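A toy sketch of the underlying idea (deliberately not the DLB API, whose calls are not reproduced here): after measuring each process's load, cores are reassigned in proportion to that load for the next iteration, conserving the total. The helper name `redistribute_cores` is hypothetical, and the sketch assumes at least one core per process with `total_cores >= len(loads)`:

```python
def redistribute_cores(loads, total_cores):
    """Assign each process a core share proportional to its measured
    load, guaranteeing at least one core per process and conserving the
    total number of cores."""
    total_load = sum(loads)
    raw = [l / total_load * total_cores for l in loads]
    cores = [max(1, int(r)) for r in raw]
    # Fix integer rounding so the total is conserved: hand out missing
    # cores to the most under-served processes, take back surplus cores
    # from the most over-served ones (never below one core).
    while sum(cores) < total_cores:
        i = max(range(len(cores)), key=lambda j: raw[j] - cores[j])
        cores[i] += 1
    while sum(cores) > total_cores:
        i = max(range(len(cores)),
                key=lambda j: cores[j] - raw[j] if cores[j] > 1 else float("-inf"))
        cores[i] -= 1
    return cores
```

For example, three processes with loads `[1, 1, 2]` on 8 cores would receive `[2, 2, 4]`; the heavily loaded process gets twice the computing power for the next step.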
Lessons learned from a performance analysis and optimization of a multiscale cellular simulation
This work presents a comprehensive performance analysis and optimization of a
multiscale agent-based cellular simulation. The optimizations applied are
guided by detailed performance analysis and include memory management, load
balance, and a locality-aware parallelization. The outcome of this paper is
not only the 2.4x speedup achieved by the optimized version with respect to
the original PhysiCell code, but also the lessons learned and best practices
for developing parallel HPC codes that are efficient and highly performant,
especially in the computational biology field.
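A minimal sketch of what a locality-aware reordering can look like (an illustrative assumption, not PhysiCell's actual implementation): agents are binned into spatial cells and reordered so that agents in the same cell sit next to each other in memory, so neighbor interactions touch contiguous data. The 2D grid and the helper name `reorder_by_cell` are made up for the example:

```python
import numpy as np

def reorder_by_cell(positions, box_size, cell_size):
    """Sort agents so that agents in the same spatial cell are
    contiguous in memory, improving cache locality for per-cell work.

    positions: (n, 2) array of agent coordinates in [0, box_size).
    Returns the reordered positions and the sorted cell index per agent.
    """
    ncell = int(np.ceil(box_size / cell_size))
    ij = np.floor(positions / cell_size).astype(int).clip(0, ncell - 1)
    cell_id = ij[:, 0] * ncell + ij[:, 1]   # flatten 2D cell to one index
    order = np.argsort(cell_id, kind="stable")
    return positions[order], cell_id[order]
```

In an agent-based code this reordering would be repeated periodically, since agents migrate between cells as the simulation advances.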
Runtime Mechanisms to Survive New HPC Architectures: A Use-Case in Human Respiratory Simulations
Computational Fluid and Particle Dynamics (CFPD) simulations are of paramount importance for studying and improving drug effectiveness. The computational requirements of CFPD codes demand high-performance computing (HPC) resources. For these reasons, we introduce and evaluate in this paper system software techniques for improving performance and tolerating load imbalance in a state-of-the-art production CFPD code. We demonstrate the benefits of these techniques on Intel-, IBM-, and Arm-based HPC technologies ranked in the Top500 list of supercomputers, showing the importance of mechanisms applied at runtime to improve performance independently of
the underlying architecture. We run a real CFPD simulation of particle tracking in the human respiratory system, showing performance improvements of up to 2x across different architectures when applying the runtime techniques, while keeping the computational resources constant. This work is partially supported by the Spanish Government (SEV-2015-0493), by the Spanish Ministry of Science and Technology project (TIN2015-65316-P), by the Generalitat de Catalunya (2017-SGR-1414), and by the European Mont-Blanc projects (288777, 610402 and 671697).
Leveraging HPC Profiling & Tracing Tools to Understand the Performance of Particle-in-Cell Monte Carlo Simulations
Large-scale plasma simulations are critical for designing and developing
next-generation fusion energy devices and for modeling industrial plasmas.
BIT1 is a massively parallel Particle-in-Cell code designed specifically for
studying plasma-material interaction in fusion devices. Its most salient
characteristic is the inclusion of collision Monte Carlo models for different
plasma species. In this work, we characterize the single-node, multi-node,
and I/O performance of the BIT1 code in two realistic cases using several HPC
profiling tools: perf, IPM, Extrae/Paraver, and Darshan. We find that the
on-node performance of the BIT1 sorting function is the main performance
bottleneck. Strong scaling tests show a parallel performance of 77% and 96%
on 2,560 MPI ranks for the two test cases. We demonstrate that communication,
load imbalance, and self-synchronization are important factors impacting the
performance of BIT1 in large-scale runs. Accepted at the Euro-Par 2023
workshops (TDLPP 2023).
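Sorting particles by cell index is a classic PIC hot spot. A common remedy, shown here as an illustrative sketch rather than BIT1's actual routine, is a counting sort: it runs in O(n) instead of O(n log n), and it yields the per-cell offsets that per-cell Monte Carlo collision kernels need as a free by-product:

```python
import numpy as np

def sort_particles(cell_idx, n_cells):
    """Counting sort of particles by cell index.

    Returns `order`, a permutation placing particles of the same cell
    contiguously, and `offsets`, where particles of cell c occupy the
    slice order[offsets[c]:offsets[c + 1]].
    """
    counts = np.bincount(cell_idx, minlength=n_cells)
    offsets = np.concatenate(([0], np.cumsum(counts)))
    order = np.empty(len(cell_idx), dtype=int)
    cursor = offsets[:-1].copy()       # next free slot of each cell
    for p, c in enumerate(cell_idx):   # single O(n) scatter pass
        order[cursor[c]] = p
        cursor[c] += 1
    return order, offsets
```

In a production code this pass would be applied to the particle arrays themselves (positions, velocities, weights) so that each cell's particles are contiguous before the collision step.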
Bronchial Aspirate-Based Profiling Identifies MicroRNA Signatures Associated With COVID-19 and Fatal Disease in Critically Ill Patients
Background: The pathophysiology of COVID-19-related critical illness is not completely understood. Here, we analyzed the microRNA (miRNA) profile of bronchial aspirate (BAS) samples from COVID-19 and non-COVID-19 patients admitted to the ICU to identify prognostic biomarkers of fatal outcomes and to define molecular pathways involved in the disease and adverse events.
Methods: Two patient populations were included (n = 89): (i) a study population composed of critically ill COVID-19 and non-COVID-19 patients; (ii) a prospective study cohort composed of COVID-19 survivors and non-survivors among patients assisted by invasive mechanical ventilation (IMV). BAS samples were obtained by bronchoaspiration during the ICU stay. The miRNA profile was analyzed using RT-qPCR. Detailed biomarker and bioinformatics analyses were performed.
Results: Deregulation of five miRNA ratios (miR-122-5p/miR-199a-5p, miR-125a-5p/miR-133a-3p, miR-155-5p/miR-486-5p, miR-214-3p/miR-222-3p, and miR-221-3p/miR-27a-3p) was observed when COVID-19 and non-COVID-19 patients were compared. In addition, five miRNA ratios segregated ICU survivors from non-survivors (miR-1-3p/miR-124-3p, miR-125b-5p/miR-34a-5p, miR-126-3p/miR-16-5p, miR-199a-5p/miR-9-5p, and miR-221-3p/miR-491-5p). Through multivariable analysis, we constructed a miRNA ratio-based prediction model for ICU mortality that selected the best combination of miRNA ratios (miR-125b-5p/miR-34a-5p, miR-199a-5p/miR-9-5p, and miR-221-3p/miR-491-5p). The model (AUC 0.85) and the miR-199a-5p/miR-9-5p ratio alone (AUC 0.80) showed optimal discrimination and outperformed the best clinical predictor of ICU mortality (days from first symptoms to IMV initiation, AUC 0.73). The survival analysis confirmed the usefulness of the miRNA ratio model and of the individual ratio for identifying patients at high risk of fatal outcomes following IMV initiation. Functional enrichment analyses identified pathological mechanisms implicated in fibrosis, coagulation, viral infections, immune responses, and inflammation.
Conclusions: COVID-19 induces a specific miRNA signature in BAS from critically ill patients. In addition, specific miRNA ratios in BAS samples hold individual and collective potential to improve risk-based patient stratification following IMV initiation in COVID-19-related critical illness. The biological role of the host miRNA profiles may allow a better understanding of the different pathological axes of the disease. We particularly want to acknowledge the patients, the Biobank IdISBa, and the CIBERES Pulmonary Biobank Consortium (PT17/0015/0001), a member of the Spanish National Biobanks Network financed by the Carlos III Health Institute, with the participation of the Units of Intensive Care, Clinical Analysis and Pulmonology of Hospital Universitario Son Espases and Hospital Son Llatzer, for their collaboration. This work was also supported by the IRBLleida Biobank (B.0000682) and Plataforma Biobancos PT17/0015/0027. Article signed by 25 authors: Marta Molinero, Iván D. Benítez, Jessica González, Clara Gort-Paniello, Anna Moncusí-Moix, Fátima Rodríguez-Jara, María C. García-Hidalgo, Gerard Torres, J. J. Vengoechea, Silvia Gómez, Ramón Cabo, Jesús Caballero, Jesús F. Bermejo-Martin, Adrián Ceccato, Laia Fernández-Barat, Ricard Ferrer, Dario Garcia-Gasulla, Rosario Menéndez, Ana Motos, Oscar Peñuelas, Jordi Riera, Antoni Torres, Ferran Barbé, and David de Gonzalo-Calvo on behalf of the CIBERESUCICOVID Project (COV20/00110 ISCIII).
Optimization of condensed matter physics application with OpenMP tasking model
The Density Matrix Renormalization Group (DMRG++) is a condensed matter physics application used to study the superconductivity properties of materials. Its main computation consists of building the Hamiltonian matrix, which requires sparse matrix-vector multiplications. This paper presents task-based parallelization and optimization strategies for the Hamiltonian algorithm. The algorithm is implemented as a mini-application in C++ and parallelized with OpenMP. The optimization leverages tasking features, such as dependencies and priorities, included in the OpenMP 4.5 standard. The code refactoring targets performance as much as programmability. The optimized version achieves a speedup of 8.0× with 8 threads and 20.5× with 40 threads on a Power9 compute node while reducing memory consumption by 90 MB with respect to the original code, all by adding fewer than ten OpenMP directives. This work is partially supported by the Spanish Government through Programa Severo Ochoa (SEV2015-0493), by the Spanish Ministry of Science and Technology (project TIN2015-65316-P), by the Generalitat de Catalunya (contract 2017-SGR-1414), and by the BSC-IBM Deep Learning Research Agreement, under JSA “Application porting, analysis and optimization for POWER and POWER AI”. This work was also partially supported by the Scientific Discovery through Advanced Computing (SciDAC) program funded by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research and Basic Energy Sciences, Division of Materials Sciences and Engineering. This research used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
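A sparse matrix-vector product naturally decomposes into independent row-block tasks, which is the shape of the OpenMP taskification described above. The sketch below mimics that decomposition in Python, with a thread pool standing in for OpenMP tasks; the CSR layout and names are illustrative, and because of the Python GIL this shows the decomposition rather than an actual speedup:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def csr_spmv_tasks(indptr, indices, data, x, n_tasks=4):
    """Sparse matrix-vector product y = A @ x, with the row range split
    into independent tasks: each task owns a disjoint block of rows, so
    no synchronization is needed on y (the analogue of an OpenMP
    taskloop over row blocks)."""
    n = len(indptr) - 1
    y = np.zeros(n)

    def block(lo, hi):
        # Classic CSR row loop, restricted to this task's rows.
        for i in range(lo, hi):
            s = 0.0
            for k in range(indptr[i], indptr[i + 1]):
                s += data[k] * x[indices[k]]
            y[i] = s

    bounds = np.linspace(0, n, n_tasks + 1).astype(int)
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            pool.submit(block, lo, hi)  # one task per row block
    return y
```

In the OpenMP version, dependencies and priorities would additionally let the runtime overlap independent Hamiltonian blocks instead of processing them in a fixed order.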